Goto

Collaborating Authors

 global optima


Mixed Linear Regression with Multiple Components

Neural Information Processing Systems

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is {\em locally strongly convex} in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for {\em constant} $K$. Furthermore, we show that our non-convex formulation can be extended to solving the {\em subspace clustering} problem as well. In particular, when initialized within a small constant distance to the true subspaces, our method converges to the global optima (and recovers true subspaces) in time {\em linear} in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing speed-up of up to two orders of magnitude.





Neural Temporal-Difference Learning Converges to Global Optima

Neural Information Processing Systems

Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.


An improved clustering-based multi-swarm PSO using local diversification and topology information

Matanga, Yves, Sun, Yanxia, Wang, Zenghui

arXiv.org Artificial Intelligence

Multi-swarm particle optimisation algorithms are gaining popularity due to their ability to locate multiple optimum points concurrently. In this family of algorithms, clustering-based multi-swarm algorithms are among the most effective techniques that join the closest particles together to form independent niche swarms that exploit potential promising regions. However, most clustering-based multi-swarms are Euclidean distance-based and only inquire about the potential of one peak within a cluster and thus can lose multiple peaks due to poor resolution. In a bid to improve the peak detection ratio, the current study proposes two enhancements. First, a preliminary local search across initial particles is proposed to ensure that each local region is sufficiently scouted prior to particle collaboration. Secondly, an investigative clustering approach that performs concavity analysis is proposed to evaluate the potential for several sub-niches within a single cluster. An improved clustering-based multi-swarm PSO (TImPSO) has resulted from these enhancements and has been tested against three competing algorithms in the same family using the IEEE CEC2013 niching datasets, resulting in an improved peak ratio for almost all the test functions.


Mixed Linear Regression with Multiple Components

Neural Information Processing Systems

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is {\em locally strongly convex} in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O (Nd)$ as well as near-optimal sample complexity $\tilde O (d)$ for {\em constant} $K$. Furthermore, we show that our non-convex formulation can be extended to solving the {\em subspace clustering} problem as well. In particular, when initialized within a small constant distance to the true subspaces, our method converges to the global optima (and recovers true subspaces) in time {\em linear} in the number of points. Furthermore, our empirical results indicate that even with random initialization, our approach converges to the global optima in linear time, providing speed-up of up to two orders of magnitude.


Implicit Regularization in Matrix Factorization

Suriya Gunasekar, Blake E. Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nati Srebro

Neural Information Processing Systems

This generalization ability cannot be explained by the capacity of the explicitly specified model class (namely, the functions representable in the chosen architecture). Instead, it seems that the optimization algorithm biases us toward a "simple" model, minimizing



The non-convex Burer-Monteiro approach works on smooth semidefinite programs Nicolas Boumal

Neural Information Processing Systems

Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some theory supports this empirical success, a complete explanation of it remains an open question. In this paper, we consider a class of SDPs which includes applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. We show that the low-rank Burer-Monteiro formulation of SDPs in that class almost never has any spurious local optima. This paper was corrected on April 9, 2018. Theorems 2 and 4 had the assumption that M (1) is a manifold.